PICTURE MASKING AND COMPOSITING IN THE FREQUENCY DOMAIN
BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention relates to video processing systems, and, in particular, to apparatuses 
and methods for performing picture masking and compositing in the DCT domain. 

Description of the Related Art 

Computer systems are frequently used to perform various types of video or image 
processing, such as picture masking and compositing. In masking, a specified fraction of certain 
pixels of a first image are retained in a new image, according to a provided mask. In compositing, 
pixels of two input images are combined or "blended" at a certain ratio, to form a new image. 

Such masking and compositing are important operations in, for example, commercial video
or image processing. Commonly used effects such as chroma keying, wipe, and
overlaying are based on compositing pictures from two video sources. Masking and compositing 
are also frequently used in production of still images, for example, when generating magazine 
advertisements and posters. 

Computer systems are also used for various data encoding purposes, such as video 
compression. Because many video compression standards (including JPEG, MPEG-1, MPEG-2, H.261, and
H.263) are based on the discrete cosine transform (DCT), it may be desirable to process compressed
pictures in the DCT domain. However, image processing techniques like masking and compositing
are typically designed to operate in the spatial domain, not the frequency, or DCT, domain. Thus, 
if the image processing of compressed video signals is done in the spatial domain, the input 
compressed video signals must be transformed into the spatial domain before being processed, and 
the processed signal must be transformed back into the DCT domain once more. Such 
transformation to the spatial domain and back into the frequency domain can be very 
computationally expensive and, therefore, undesirable. Moreover, conventional "brute force"
convolutions performed directly in the frequency domain are also extremely computationally
expensive.

SUMMARY

In the present invention, at least one image signal and a mask signal are received, wherein 
the image signal and mask signal are in the DCT domain. Masking of the image signal is performed 
in the DCT domain, in accordance with the mask signal, by representing the masking in terms of the 
DCT basis functions, to provide an output image signal.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a prior art spatial domain image processing system; 

Fig. 2 is a block diagram of a DCT domain image processing system, in accordance with a 
preferred embodiment of the present invention; and 

Fig. 3 depicts an exemplary image processed by the DCT domain image
processing system of Fig. 2.

DESCRIPTION OF THE PREFERRED EMBODIMENT

As explained above, performing image processing techniques like masking and compositing 
in the spatial domain has drawbacks when processing compressed video signals, which are in a
frequency domain such as the DCT domain. Accordingly, in the present invention, there is
provided an efficient method and associated apparatus for implementing picture masking and
compositing in the DCT domain. As described in further detail below, the technique of the present 
invention is based on representing the masking function in terms of the DCT basis functions and 
computing the masking as a weighted sum of the results of masking by the DCT basis functions. 

In the DCT domain, masking by the DCT basis functions has a relatively simple and 
efficient implementation. Because of the energy compaction property of the DCT, the weights of
many of the basis functions are very small, and those terms can be dropped from the weighted sum without introducing
noticeable artifacts. This leads to very efficient implementations for masking and compositing 
images in the DCT domain, typically requiring less than three multiplications per pixel. These and 
other features and advantages of the present invention are described in further detail below. 

Spatial Domain Processing of DCT Images

Referring now to Fig. 1 , there is shown a prior art spatial domain image processing system 
100. As illustrated, spatial domain image processing system 100 includes three inverse DCT 
(IDCT) functional blocks 120, 121, 122, and a DCT functional block 130, as well as spatial domain 
processing functional block 110. As will be appreciated by those skilled in the art, each of these 
functional blocks may be implemented in hardware or software. For example, the IDCT and DCT
operations of blocks 120, 121, 122, and 130, respectively, as well as the spatial domain processing
of block 110, may be performed by a suitably programmed general-purpose or special-purpose
microprocessor. 

System 100 receives as input signals the mask signal and image signals x_0 and x_1, each of
which is in the DCT domain. For example, image signals x_0 and x_1 may have been previously
compressed with a process that utilizes the DCT. System 100 outputs output image signal y, which
represents the compositing of image signals x_0 and x_1 in accordance with the mask signal. Output
image signal y is also in the DCT domain. Since block 110 performs image processing in the spatial
domain (e.g., with RGB or YUV spatial representations of image pixels), IDCT blocks 120, 121,
and 122 are necessary in prior art systems to transform the input signals into the spatial domain.
Once the (spatial domain) input signals are processed, the processed output signal must be
transformed back into the DCT domain, to provide signal y.

As will be appreciated, it is trivial for spatial domain processing unit 110 to implement
spatial masking in the spatial domain by using spatial windowing. For an input picture x[m,n],
masking (also referred to as windowing) with the mask, or window, w[m,n], is simply

$$y[m,n] = w[m,n] \times x[m,n] \tag{1}$$

As will be appreciated, windowing in the spatial domain is equivalent to convolution in the 
frequency domain. The masking in (1) can, therefore, be implemented by DCT processing of DCT 
signals as 

$$Y[k,l] = W[k,l] * X[k,l] \tag{2}$$

where X[k,l], Y[k,l], and W[k,l] are the frequency representations of x[m,n], y[m,n], and w[m,n],
respectively, * is the convolution operator, m, n are the spatial domain indices, and k, l are the DCT
or frequency domain indices. The approach in (2) is a "brute force" DCT domain processing
implementation based on symmetric convolution. As will be appreciated, a symmetric convolution
is achieved by making a symmetric extension of two finite length signals and then convolving the
extended signals together using circular convolution. If the frequency domain is the discrete
Fourier transform (DFT) domain, the convolution in (2) is circular convolution. Further
background on such techniques may be found in D.E. Dudgeon & R.M. Mersereau,
Multidimensional Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1984. If the
frequency domain is the DCT domain (or that of another discrete trigonometric transform), the convolution
in (2) is a symmetric convolution. Further background on symmetric convolutions may be found
in S.A. Martucci, Symmetric Convolution and the Discrete Sine and Cosine Transforms: Principles
and Applications, PhD thesis, Georgia Institute of Technology, 1993. Spatial masking in the DCT
domain can, therefore, be implemented by using symmetric convolution according to (2).

Masking can be used to implement compositing of two input pictures x_0[m,n] and x_1[m,n]
according to

$$y[m,n] = x_1[m,n] + w[m,n] \times \left(x_0[m,n] - x_1[m,n]\right) \tag{3}$$

In this case a mask value of one means that samples from x_0[m,n] are used, while a mask value of
zero means that samples from x_1[m,n] are used. Mask values in the range from zero to one imply
linear interpolation between the two signals x_0 and x_1. (Mask values outside the unit interval imply
linear extrapolation of the two input pictures.)
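As a minimal sketch of (1) and (3) in the spatial domain, the following Python fragment composites two small pictures held as nested lists. The function name `composite` and the sample values are illustrative assumptions and are not part of the described system.

```python
def composite(x0, x1, w):
    """Blend two equally sized pictures per (3): y = x1 + w * (x0 - x1).

    x0, x1 and w are lists of rows; w holds one mask value per pixel.
    """
    return [
        [b + m * (a - b) for a, b, m in zip(row0, row1, row_w)]
        for row0, row1, row_w in zip(x0, x1, w)
    ]


# Hypothetical 2x2 pictures and mask, chosen only to exercise each case.
x0 = [[100.0, 100.0], [100.0, 100.0]]
x1 = [[20.0, 20.0], [20.0, 20.0]]
# 1.0 selects x0, 0.0 selects x1, 0.5 averages, 1.5 extrapolates past x0.
w = [[1.0, 0.0], [0.5, 1.5]]

y = composite(x0, x1, w)
print(y)
```

Setting x1 to an all-zero picture reduces the same function to the plain masking of (1).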

If the mask, w[m,n], is a separable signal, the convolution in (2) can be implemented as two
separate one-dimensional (1-D) convolutions. As will be appreciated, a 2-D signal x[m,n] is
separable if there exist two 1-D signals r[m] and s[n] such that x[m,n] = r[m]s[n] (i.e., it can be
implemented as a cascade of horizontal and vertical DCTs). In the separable case, the convolution
may provide a reasonable approach to masking, since it requires, for example, only 16
multiplications per sample for an 8x8 DCT. For non-separable masks, however, the convolution
approach to masking is not as feasible since, for example, masking for an 8x8 block DCT requires
64 multiplications per sample and considerable data shuffling.

Accordingly, both spatial domain processing of DCT images and the brute force DCT
approach often require an undesirably high amount of computation. As video compression resulting
in DCT images becomes more common, it becomes more desirable to do picture processing on
compressed image data without completely decoding or decompressing the image data. Some
techniques in this regard are discussed in further detail in B.C. Smith & L.A. Rowe, "Algorithms
for Manipulating Compressed Images," IEEE Computer Graphics & Applications, pp. 34-42,
September 1993; S-F Chang & D.G. Messerschmitt, "A New Approach to Decoding and
Compositing Motion-Compensated DCT-Based Images," ICASSP-93, pp. V421-V424, 1993; and
N. Merhav & V. Bhaskaran, "A Transform Domain Approach to Spatial Domain Image Scaling,"
ICASSP-96, pp. 2405-2409, 1996.

Frequency Domain Processing of DCT Images 

In the present invention, compressed pictures are processed in the DCT domain with a
technique based on representing the masking function in terms of the DCT basis functions and
computing the masking as a weighted sum of the results of masking by the DCT basis functions, as
described in further detail below. Such DCT domain processing makes it possible to reduce both
the computational complexity and the latency of the processing, by eliminating the need for
transforming signals from the DCT domain into the spatial domain and back.

As will be appreciated, if the desired processing (i.e., masking and compositing) is done in
the DCT domain, the three IDCT transforms 120, 121, 122, and the DCT transform 130, required
in spatial domain processing of DCT images can be eliminated. Referring now to Fig. 2, there is
shown a block diagram of a DCT domain image processing system 200, in accordance with a
preferred embodiment of the present invention. As shown, system 200 comprises DCT domain
processor 210, but neither comprises nor requires the three IDCT transforms and one DCT transform
used in spatial domain processing. Instead, DCT domain processor 210 operates in the DCT
domain, and is thus able to provide processing efficiencies relative to spatial domain processing.

In one embodiment, system 200 operates with respect to two-dimensional (2-D) type-II DCT 
of 8x8 blocks, such as is used by the image and video compression standards JPEG, MPEG-1,
MPEG-2, H.261 and H.263. As will be appreciated, however, in alternative embodiments the 
present invention may be utilized with other types of DCTs and other block sizes. 

The 8x8 type-II DCT is given by

$$X[k,l] = \eta[k]\,\eta[l] \sum_{m=0}^{7} \sum_{n=0}^{7} x[m,n] \cos\!\left(\frac{\pi}{16}(2m+1)k\right) \cos\!\left(\frac{\pi}{16}(2n+1)l\right) \tag{4}$$

where η is a frequency-dependent DCT normalization coefficient which depends on the values of
DCT domain indices k, l. It should be noted that even though the 2-D DCT can be used to represent
non-separable signals, the transform itself is separable, and the basis functions of the 2-D DCT are
separable. DCT basis functions are discussed in further detail below, with reference to (15).

The 2-D DCT of each block can be implemented using matrix multiplications

$$\mathbf{X} = \mathbf{C}\,\mathbf{x}\,\mathbf{C}^{T} \tag{5}$$

where $\mathbf{X}$ and $\mathbf{x}$ are matrix representations of X[k,l] and x[m,n], respectively, and $\mathbf{C}$ is the DCT
transformation matrix (a unitary matrix, i.e., $\mathbf{C}\mathbf{C}^{T} = \mathbf{I}$). For the 8x8 DCT, the matrices $\mathbf{X}$, $\mathbf{x}$ and $\mathbf{C}$
are all 8x8.
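The matrix form in (5) can be illustrated with a short, self-contained Python sketch. It assumes the usual orthonormal scaling for the normalization coefficient (√(1/8) for index 0 and √(2/8) otherwise); the helper names `dct_matrix`, `dct2` and `idct2` and the test block are illustrative only.

```python
import math

N = 8  # block size of the 8x8 DCT


def dct_matrix(n=N):
    """Return C with C[k][m] = eta[k] * cos(pi * (2m + 1) * k / (2n))."""
    rows = []
    for k in range(n):
        # Assumed orthonormal scaling: eta[0] = sqrt(1/n), eta[k > 0] = sqrt(2/n).
        eta = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([eta * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(x, c):
    """Forward 2-D DCT of one block, as in (5): X = C x C^T."""
    return matmul(matmul(c, x), transpose(c))


def idct2(X, c):
    """Inverse 2-D DCT, valid because C is unitary: x = C^T X C."""
    return matmul(matmul(transpose(c), X), c)


C = dct_matrix()
# Arbitrary test block (hypothetical pixel values).
x = [[float((3 * m + 5 * n) % 17) for n in range(N)] for m in range(N)]
X = dct2(x, C)
x_back = idct2(X, C)

# Deviation of C C^T from the identity, and of the round trip from x.
identity_error = max(abs(v - (1.0 if i == j else 0.0))
                     for i, row in enumerate(matmul(C, transpose(C)))
                     for j, v in enumerate(row))
roundtrip_error = max(abs(a - b)
                      for ra, rb in zip(x, x_back) for a, b in zip(ra, rb))
print(identity_error, roundtrip_error)  # both are expected to be at rounding-error level
```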

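Vertical spatial masking by any window v_k[m] and horizontal masking by any window v_l[n]
can be implemented as the matrix multiplication

$$\mathbf{y}_{k,l} = \mathbf{V}_k\,\mathbf{x}\,\mathbf{V}_l \tag{6}$$

where

$$\mathbf{V}_k = \operatorname{diag}(v_k[n]) \tag{7}$$

Based on the fact that $\mathbf{C}$ is unitary, it can be derived that

$$\mathbf{Y}_{k,l} = \mathbf{C}\,\mathbf{y}_{k,l}\,\mathbf{C}^{T} \tag{8}$$

$$= \mathbf{C}\,\mathbf{V}_k\,\mathbf{x}\,\mathbf{V}_l\,\mathbf{C}^{T} \tag{9}$$

$$= \mathbf{C}\,\mathbf{V}_k\,\mathbf{C}^{T}\,\mathbf{C}\,\mathbf{x}\,\mathbf{C}^{T}\,\mathbf{C}\,\mathbf{V}_l\,\mathbf{C}^{T} \tag{10}$$

$$= \tilde{\mathbf{V}}_k\,\mathbf{X}\,\tilde{\mathbf{V}}_l \tag{11}$$

where

$$\tilde{\mathbf{V}}_j = \mathbf{C}\,\mathbf{V}_j\,\mathbf{C}^{T}. \tag{12}$$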
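The derivation in (8) through (12) can be checked numerically. The sketch below is a hedged illustration: it assumes an orthonormal type-II DCT matrix, uses two arbitrary 1-D windows (a vertical ramp and a horizontal step, both hypothetical), and compares the spatial route with the DCT domain route.

```python
import math

N = 8  # 8x8 blocks


def dct_matrix(n=N):
    """Type-II DCT matrix; the orthonormal scaling eta[k] is an assumption."""
    rows = []
    for k in range(n):
        eta = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([eta * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def diag(v):
    """Diagonal matrix with the window samples on the diagonal, as in (7)."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))]
            for i in range(len(v))]


def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


C = dct_matrix()
Ct = transpose(C)

# Arbitrary block and windows (hypothetical values).
x = [[float((3 * m + 5 * n) % 17) for n in range(N)] for m in range(N)]
v_vert = [m / (N - 1) for m in range(N)]             # vertical ramp
v_horz = [1.0 if n < 4 else 0.25 for n in range(N)]  # horizontal step
V_vert, V_horz = diag(v_vert), diag(v_horz)

# Spatial route: window the pixels as in (6), then transform as in (8).
y = matmul(matmul(V_vert, x), V_horz)
Y_spatial = matmul(matmul(C, y), Ct)

# DCT route: transform the windows once as in (12), then apply (11).
X = matmul(matmul(C, x), Ct)
V_vert_dct = matmul(matmul(C, V_vert), Ct)
V_horz_dct = matmul(matmul(C, V_horz), Ct)
Y_dct = matmul(matmul(V_vert_dct, X), V_horz_dct)

diff = max_abs_diff(Y_spatial, Y_dct)
print(diff)  # expected to be at rounding-error level
```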

It should be noted that non-separable masking cannot be expressed in a simple matrix
multiplication form similar to (11). However, a non-separable mask can be transformed by the 2-D
DCT, which does have separable basis functions. The IDCT of the DCT domain representation of
the non-separable mask, W[k,l], is given by

$$w[m,n] = \sum_{k=0}^{7} \sum_{l=0}^{7} W[k,l]\, \eta[k] \cos\!\left(\frac{\pi}{16}(2m+1)k\right) \eta[l] \cos\!\left(\frac{\pi}{16}(2n+1)l\right) \tag{13}$$

Substituting (13) into (1) gives

$$y[m,n] = \sum_{k=0}^{7} \sum_{l=0}^{7} \left\{ \left(W[k,l]\,\eta[k]\,\eta[l]\right) v_k[m]\, x[m,n]\, v_l[n] \right\} \tag{14}$$

where

$$v_k[m] = \cos\!\left(\frac{\pi}{16}(2m+1)k\right) \tag{15}$$

This implies that masking x[m,n] with w[m,n] is equivalent to a weighted sum of the
masking of x[m,n] with the basis functions of the IDCT. It should be noted that the windowing in
(14) is separable and can, therefore, be written in terms of the matrix multiplications in (6), where
both v_k[m] and v_l[n] are given by (15) for k, l ∈ {0, 1, ..., 7}. Therefore, the DCT transform of the
masked signal becomes

$$\mathbf{Y} = \sum_{k=0}^{7} \sum_{l=0}^{7} \left( W[k,l]\,\eta[k]\,\eta[l] \right) \tilde{\mathbf{V}}_k\, \mathbf{X}\, \tilde{\mathbf{V}}_l \tag{16}$$

where the numerical values of the DCT domain windowing matrices, $\tilde{\mathbf{V}}_j$, can be evaluated according
to (12).

Thus, a non-separable mask can be implemented as a weighted sum of separable functions
(13), and masking can be accomplished with separable functions using matrix notations (11).
Therefore, non-separable masks can be implemented as a weighted sum of separable masking
operations (16). The separable masking operations in (16), by the DCT basis functions, then turn
out to have a simple and efficient implementation.
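To make the weighted sum in (16) concrete, the following sketch masks one block with a non-separable (triangular) mask entirely in the DCT domain and compares the result with the DCT of the spatially masked block. It assumes orthonormal scaling for the normalization coefficient; the mask, the test block and all helper names are hypothetical.

```python
import math

N = 8


def eta(k):
    """Assumed orthonormal DCT scaling for the normalization coefficient."""
    return math.sqrt((1.0 if k == 0 else 2.0) / N)


def basis(k):
    """DCT basis function v_k[m] of (15)."""
    return [math.cos(math.pi * (2 * m + 1) * k / (2 * N)) for m in range(N)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


C = [[eta(k) * v for v in basis(k)] for k in range(N)]
Ct = transpose(C)


def dct2(a):
    """2-D DCT of a block as in (5)."""
    return matmul(matmul(C, a), Ct)


# DCT domain windowing matrices of (12), one per basis function.
V_dct = [dct2([[basis(j)[m] if m == n else 0.0 for n in range(N)]
               for m in range(N)]) for j in range(N)]

# Arbitrary block and a triangular, non-separable mask (hypothetical values).
x = [[float((3 * m + 5 * n) % 17) for n in range(N)] for m in range(N)]
w = [[1.0 if m + n < N else 0.0 for n in range(N)] for m in range(N)]
X, W = dct2(x), dct2(w)

# Weighted sum of separable maskings, as in (16).
Y = [[0.0] * N for _ in range(N)]
for k in range(N):
    for l in range(N):
        weight = W[k][l] * eta(k) * eta(l)
        term = matmul(matmul(V_dct[k], X), V_dct[l])
        for i in range(N):
            for j in range(N):
                Y[i][j] += weight * term[i][j]

# Reference: mask in the spatial domain as in (1), then transform.
Y_ref = dct2([[w[m][n] * x[m][n] for n in range(N)] for m in range(N)])
diff = max(abs(p - q) for ra, rb in zip(Y, Y_ref) for p, q in zip(ra, rb))
print(diff)  # expected to be at rounding-error level
```

This direct evaluation uses full matrix products for clarity and does not exploit the sparsity of the windowing matrices.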

As will be appreciated, the functions defined in (15) are the DCT basis functions (for 1-D
type-II DCT of size N=8). The DCT basis functions form an orthogonal basis that can represent all
discrete functions of length N. The factor η normalizes the basis functions so that η[k] times the basis
function in (15) (i.e., η[k]v_k[m]) forms an orthonormal (normalized orthogonal) basis for all
functions of length N. Since the basis functions for the 2-D DCT are formed as the product of two
1-D basis functions v_k[m], v_l[n], the 2-D DCT basis functions are separable.

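The windowing matrices for the DCT basis functions, $\tilde{\mathbf{V}}_j$, are sparse and have very regular
structure. For example, for j = 4 we have

$$\tilde{\mathbf{V}}_4 = \frac{1}{2}\begin{bmatrix}
0 & 0 & 0 & 0 & \sqrt{2} & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
\sqrt{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & -1 & 0 & 0
\end{bmatrix} \tag{17}$$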

and the other windowing matrices have the same kind of structure. 
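The sparsity of the windowing matrices can be inspected numerically. The sketch below assumes an orthonormal type-II DCT matrix and evaluates the DCT domain windowing matrix of each basis function via (12); the helper name `dct_window_matrix` is illustrative. It prints twice the matrix for j = 4 and counts the nonzero entries per row over all eight matrices.

```python
import math

N = 8


def dct_matrix(n=N):
    """Type-II DCT matrix; the orthonormal scaling eta[k] is an assumption."""
    rows = []
    for k in range(n):
        eta = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([eta * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


C = dct_matrix()
Ct = transpose(C)


def dct_window_matrix(j):
    """C V_j C^T as in (12), with V_j = diag of the basis function in (15)."""
    v = [math.cos(math.pi * (2 * m + 1) * j / (2 * N)) for m in range(N)]
    V = [[v[m] if m == n else 0.0 for n in range(N)] for m in range(N)]
    return matmul(matmul(C, V), Ct)


V4 = dct_window_matrix(4)
# Scale by 2 and round; adding 0.0 turns any -0.0 into 0.0 for display.
scaled_V4 = [[round(2.0 * v, 6) + 0.0 for v in row] for row in V4]
for row in scaled_V4:
    print(row)

# Largest number of nonzero entries found in any row of any windowing matrix.
max_nonzeros_per_row = max(
    sum(1 for v in row if abs(v) > 1e-9)
    for j in range(N) for row in dct_window_matrix(j)
)
print(max_nonzeros_per_row)
```

With at most two nonzero entries per row, each product with a windowing matrix needs at most one addition per output sample.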

By incorporating the factor of 1/2 into the windowing function W[k,l], each matrix
multiplication in (16) can be implemented using only one addition per sample and two
multiplications (by √2) per 64 samples (for 8x8 DCT). If the DCT coefficients are obtained from
decoding JPEG or MPEG streams, the multiplications by √2 can be incorporated into the
quantization matrices, reducing the computational complexity to only one addition per sample for
each matrix multiplication in (16). In addition, there is one multiplication and one addition per
pixel for each term in the weighted sum in (16). Therefore, the computational complexity of
implementing masking according to (16) is approximately one multiplication and three additions
per pixel for each term that is evaluated. Additionally, when the weighting coefficient, W[k,l], is
zero, the whole term can be dropped and no computation is needed for that term.

As will be appreciated, the DCT approach is used in compression systems, such as JPEG
and MPEG, because for most signals the energy is concentrated into relatively few DCT
coefficients. In the present invention, this property is utilized to save computations by skipping all
processing for weighting coefficients, W[k,l], equal to zero. Alternatively, the savings can be made
more substantial by also dropping weighting coefficients close to zero. That is, the representation
of the masking in terms of the weighted sum allows computational complexity to be reduced by
skipping all processing for weighting coefficients W[k,l] equal to zero (or, in one embodiment,
for all weighting coefficients W[k,l] less than a predetermined threshold).

As will be appreciated, by adjusting the threshold for choosing which coefficients are
dropped, the quality of the masking operation can be traded for computational complexity in a
similar manner as quality is traded for bit rate in encoding. The ability to trade off quality of the
masking against computational complexity gives great flexibility in trading cost for quality.
Accordingly, the frequency domain implementation of picture masking and compositing of the
present invention can be very efficient.

Thus, as will be appreciated by those skilled in the art, in the present invention, the
masking function is implemented in terms of the DCT basis functions. As will be appreciated, any
necessary scaling is first performed, and may be incorporated into the quantization matrix in an
inverse quantization. Next, a weighted sum of the blocks masked in this fashion is
computed. In one embodiment, the masked block at this point is re-normalized, in accordance
with the scaling done previously. (As will be appreciated, the initial scaling and re-normalization
scaling may be incorporated into the quantization matrix if the input signal is dequantized and the
output signal is quantized.)

In one embodiment, all processing for weighting coefficients W[k,l] equal to zero is skipped
(where, for an input picture x[m,n], w[m,n] is the window used to mask the input picture, and
W[k,l] is the frequency representation of w[m,n]). In an alternative embodiment, because of the
aforementioned energy compaction property of the DCT, all processing is skipped for weighting
coefficients W[k,l] close to zero. In general, all processing is skipped for weighting coefficients
W[k,l] less than or equal to a predetermined threshold value (where a threshold value of zero yields
the former case). In one embodiment of the present invention, this threshold is selected in
accordance with a desired tradeoff between the quality of the masking operation and computational
complexity, where a higher threshold provides lower quality but greater savings in computational
complexity, and vice-versa.

Compositing of two images can be implemented by use of masking, according to (3).

For a given block of the image (e.g., an 8-by-8 block), for example, the following steps may
be taken by a suitably programmed processor to implement the present invention, in one
embodiment. First, examine every DCT coefficient of the mask, W[k,l], and if the coefficient is
"relevant" (i.e., either nonzero or bigger than a given threshold, depending on the
embodiment), then do masking by the corresponding basis function, multiply each coefficient of the
masked signal by the weighting coefficient, and add the result to the weighted sum for the block.
This implements the weighted sum in (16).
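The per-block steps above can be sketched as follows. This is an illustration under stated assumptions rather than the described implementation: it assumes orthonormal DCT scaling, compares the magnitude of each mask coefficient with the threshold, and uses a hypothetical horizontal ramp mask; the function name `mask_block` is likewise an assumption.

```python
import math

N = 8


def eta(k):
    """Assumed orthonormal DCT scaling for the normalization coefficient."""
    return math.sqrt((1.0 if k == 0 else 2.0) / N)


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = [[eta(k) * math.cos(math.pi * (2 * m + 1) * k / (2 * N))
      for m in range(N)] for k in range(N)]
Ct = [list(col) for col in zip(*C)]


def dct2(a):
    return matmul(matmul(C, a), Ct)


# DCT domain windowing matrices for the eight basis functions, computed once.
V_DCT = [dct2([[math.cos(math.pi * (2 * m + 1) * j / (2 * N)) if m == n else 0.0
                for n in range(N)] for m in range(N)]) for j in range(N)]


def mask_block(X, W, threshold=1e-9):
    """Mask DCT block X by DCT-domain mask W, skipping small mask coefficients.

    Returns the masked DCT block and the number of terms actually evaluated.
    Comparing abs(W[k][l]) with the threshold is an assumption of this sketch.
    """
    Y = [[0.0] * N for _ in range(N)]
    terms = 0
    for k in range(N):
        for l in range(N):
            if abs(W[k][l]) <= threshold:
                continue  # coefficient not relevant: no work for this term
            weight = W[k][l] * eta(k) * eta(l)
            masked = matmul(matmul(V_DCT[k], X), V_DCT[l])
            for i in range(N):
                for j in range(N):
                    Y[i][j] += weight * masked[i][j]
            terms += 1
    return Y, terms


def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Hypothetical data: arbitrary block and a horizontal ramp mask from 0 to 1.
x = [[float((3 * m + 5 * n) % 17) for n in range(N)] for m in range(N)]
w = [[n / (N - 1) for n in range(N)] for _ in range(N)]
X, W = dct2(x), dct2(w)
Y_ref = dct2([[w[m][n] * x[m][n] for n in range(N)] for m in range(N)])

Y_exact, terms_exact = mask_block(X, W)                    # drops only (near-)zero weights
Y_approx, terms_approx = mask_block(X, W, threshold=0.03)  # also drops small weights
err_exact = max_abs_diff(Y_exact, Y_ref)
err_approx = max_abs_diff(Y_approx, Y_ref)
print(terms_exact, err_exact)
print(terms_approx, err_approx)
```

Raising the threshold lowers the number of evaluated terms at the price of a larger deviation from the reference.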

The masking by the DCT basis functions can be implemented in terms of matrix 
multiplications as shown in (11) (and (16)). However, a more efficient implementation can be
achieved by taking into account the regular structure of the windowing matrices as the example in
(17) shows. Several of these more efficient implementations are discussed in the detailed 
description above. 

For processing of original signals already in the DCT domain, the frequency domain
processing of the present invention requires less computation than both spatial domain processing
and brute force DCT domain processing based on symmetric convolution. Through empirical
testing and modeling, the inventors have found that the computational complexity involved in using
the frequency domain processing of the present invention is approximately one to four
multiplications per sample for most typical masking operations. As will be appreciated, by using
an algorithm similar to rate control algorithms, the complexity of spatial masking in the DCT
domain can be limited to only three multiplications per sample without any noticeable degradation
of the masking quality.

By contrast, a single 2-D DCT takes about three multiplications per sample, and when
implementing masking of JPEG or MPEG compressed pictures in the spatial domain, IDCTs must
first be used to transform the DCT data into the spatial domain, and a DCT operation must then be used to
transform the processed picture back into the DCT or frequency domain. Thus, when implementing
picture compositing, there are at least two IDCTs (one for each input picture) and one DCT (for the
composite picture) needed, in addition to the spatial processing. Therefore, there are at least nine
multiplications per sample needed for implementing picture compositing in the spatial domain, which is
approximately three times what is needed for described embodiments of the DCT domain
implementation of the present invention. In sum, therefore, for picture compositing of pictures in
the DCT domain, the present invention, in one embodiment, requires about three times fewer
multiplications per pixel than spatial domain processing, and about twenty times fewer
multiplications than processing based on brute force convolution.
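The ratios quoted above can be retraced with simple arithmetic. The sketch below assumes the per-sample figures given in this description: about three multiplications for each 2-D DCT or IDCT, about three for the DCT domain method, and, as an assumed reading of the brute force case, 64 for convolution with a non-separable mask on 8x8 blocks.

$$\underbrace{2 \times 3}_{\text{two IDCTs}} + \underbrace{1 \times 3}_{\text{one DCT}} = 9, \qquad \frac{9}{3} = 3, \qquad \frac{64}{3} \approx 21$$

Under these assumptions the last ratio is consistent with the figure of about twenty.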

Referring now to Fig. 3, there is depicted an exemplary image 300 processed using DCT
domain picture compositing performed in the frequency domain by image processing system 200.
Image 300 contains a head-and-shoulder portion 312, which is overlaid over a flower garden
background 310, and a transparent logo "SARNOFF" 315, which was inserted in the top right hand
corner of image 300. The picture compositing performed by system 200 to arrive at image 300 was
performed, in one actual experiment, using only 1.8 multiplications per pixel.

As will be appreciated, although the embodiments of the present invention described above
are implemented with respect to the DCT frequency domain, the present invention is also potentially
applicable to other frequency domains in which the masking function may be represented in terms 
of the frequency domain's basis functions and in which the masking can then be computed as a 
weighted sum of the results of masking by these basis functions. For example, the present invention 
may be applicable to other frequency domains such as the DFT and discrete sine transform (DST). 

As will be understood, the present invention can be embodied in the form of computer-implemented
processes and apparatuses for practicing those processes. The present invention can
also be embodied in the form of computer program code embodied in tangible media, such as 
floppy diskettes, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, 
when the computer program code is loaded into and executed by a computer, the computer becomes 
an apparatus for practicing the invention. The present invention can also be embodied in the form 
of computer program code, for example, whether stored in a storage medium, loaded into and/or 
executed by a computer, or transmitted over some transmission medium, such as over electrical 
wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the 
computer program code is loaded into and executed by a computer, the computer becomes an 
apparatus for practicing the invention. When implemented on a general-purpose microprocessor, 
the computer program code segments configure the microprocessor to create specific logic circuits. 

It will be understood that various changes in the details, materials, and arrangements of the 
parts which have been described and illustrated above in order to explain the nature of this 
invention may be made by those skilled in the art without departing from the principle and scope 
of the invention as recited in the following claims.

CLAIMS

What is claimed is: 



1. A method for processing image signals, comprising the steps of:
(a) receiving at least one image signal and a mask signal, wherein the image signal and
mask signal are in a frequency domain; and
(b) performing masking of the image signal in the frequency domain, in accordance with the
mask signal, by representing the masking in terms of the basis functions of the
frequency domain, to provide an output image signal.

2. The method of claim 1, wherein the frequency domain is the discrete cosine transform
(DCT) domain.

3. The method of claim 2, wherein step (b) comprises the steps of:
(1) masking blocks of the image signal with the DCT basis functions to provide masked
blocks; and
(2) performing a weighted sum of the masked blocks.

4. The method of claim 2, wherein the image signal is divided into 8x8 blocks and the DCT
is a two-dimensional type-II DCT.

5. The method of claim 1, wherein step (a) comprises the step of receiving first and second
image signals, the method further comprising the step of compositing the first and second image
signals in accordance with the mask signal.

6. The method of claim 1, wherein step (b) comprises the step of skipping all processing for
weighting coefficients W[k,l] equal to zero, wherein the image signal is represented by x[m,n], the
mask signal is represented by a window w[m,n], and W[k,l] is the frequency representation of
w[m,n].

7. The method of claim 1, wherein step (b) comprises the step of skipping all processing for
weighting coefficients W[k,l] less than or equal to a specified threshold, wherein the image signal
is represented by x[m,n], the mask signal is represented by a window w[m,n], and W[k,l] is the
frequency representation of w[m,n].

8. The method of claim 7, further comprising the step of selecting the threshold in
accordance with a desired tradeoff between the quality of the masking of step (b) and the
computational complexity required to perform the masking of step (b), wherein a higher threshold
provides lower masking quality but smaller computational complexity, and vice-versa.

9. An apparatus for processing image signals, the apparatus comprising:
(a) means for receiving at least one image signal and a mask signal, wherein the image
signal and mask signal are in a frequency domain; and
(b) means for performing masking of the image signal in the frequency domain, in
accordance with the mask signal, by representing the masking in terms of the basis
functions of the frequency domain, to provide an output image signal.

10. The apparatus of claim 9, wherein the frequency domain is the discrete cosine transform
(DCT) domain.

11. The apparatus of claim 10, wherein means (b) comprises:
(1) means for masking blocks of the image signal with the DCT basis functions to provide
masked blocks; and
(2) means for performing a weighted sum of the masked blocks.

12. The apparatus of claim 10, wherein the image signal is divided into 8x8 blocks and the
DCT is a two-dimensional type-II DCT.

13. The apparatus of claim 9, wherein means (a) comprises means for receiving first and
second image signals, the apparatus further comprising means for compositing the first and second
image signals in accordance with the mask signal.

14. The apparatus of claim 9, wherein means (b) comprises means for skipping all
processing for weighting coefficients W[k,l] equal to zero, wherein the image signal is represented
by x[m,n], the mask signal is represented by a window w[m,n], and W[k,l] is the frequency
representation of w[m,n].

15. The apparatus of claim 9, wherein means (b) comprises means for skipping all
processing for weighting coefficients W[k,l] less than or equal to a specified threshold, wherein the
image signal is represented by x[m,n], the mask signal is represented by a window w[m,n], and
W[k,l] is the frequency representation of w[m,n].

16. The apparatus of claim 15, further comprising means for selecting the threshold in
accordance with a desired tradeoff between the quality of the masking of step (b) and the
computational complexity required to perform the masking of step (b), wherein a higher threshold
provides lower masking quality but smaller computational complexity, and vice-versa.

17. A storage medium having stored thereon a plurality of instructions for processing image
signals, wherein the plurality of instructions, when executed by a processor, cause the processor to
perform the steps of:
(a) receiving at least one image signal and a mask signal, wherein the image signal and
mask signal are in a frequency domain; and
(b) performing masking of the image signal in the frequency domain, in accordance with the
mask signal, by representing the masking in terms of the basis functions of the
frequency domain, to provide an output image signal.

18. The storage medium of claim 17, wherein:
the frequency domain is the discrete cosine transform (DCT) domain; and
step (b) comprises the steps of:
(1) masking blocks of the image signal with the DCT basis functions to provide
masked blocks; and
(2) performing a weighted sum of the masked blocks.

19. The storage medium of claim 17, wherein step (b) comprises the step of skipping all
processing for weighting coefficients W[k,l] less than or equal to a specified threshold, wherein the
image signal is represented by x[m,n], the mask signal is represented by a window w[m,n], and
W[k,l] is the frequency representation of w[m,n].

20. The storage medium of claim 19, further comprising the step of selecting the threshold
in accordance with a desired tradeoff between the quality of the masking of step (b) and the
computational complexity required to perform the masking of step (b), wherein a higher threshold
provides lower masking quality but smaller computational complexity, and vice-versa.